
All Questions

0 votes, 0 answers, 12 views

Isolation Forest sample size

I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features. To prevent any overfitting, what would you recommend to tune ...
User: Mar (reputation 85)
4 votes, 1 answer, 54 views

Unsupervised Isolation Forest sklearn hyperparameters

I am using sklearn's IsolationForest for unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...
User: Mar (reputation 85)
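For the IsolationForest hyperparameter questions above, a minimal sketch of the parameters usually discussed for tuning (`n_estimators`, `max_samples`, `contamination`, `max_features`). The data are random stand-ins for a 50-row, 3-feature set, and the values are illustrative assumptions, not recommendations. In a fully unsupervised setting there is typically no labelled score to optimise, so `contamination` acts as a prior guess at the outlier share rather than a tuned quantity.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy stand-in for a small dataset (50 rows, 3 features); not the asker's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:3] += 6  # shift three rows far away so they look anomalous

model = IsolationForest(
    n_estimators=200,     # more trees -> more stable scores; cheap on 50 rows
    max_samples="auto",   # min(256, n_samples) -> all 50 rows here
    contamination=0.05,   # assumed outlier share; sets the predict() threshold
    max_features=1.0,     # fraction of features drawn for each tree
    random_state=0,
)
labels = model.fit_predict(X)    # -1 = anomaly, 1 = normal
scores = model.score_samples(X)  # lower = more anomalous
print(np.where(labels == -1)[0])
```

With so few rows, fixing `random_state` and raising `n_estimators` mainly reduces run-to-run variance in the scores; it does not add information the data lack.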
-1 votes, 0 answers, 37 views

ML model for Career Prediction

I am NOT able to figure out how to make an ML model. I have been using ChatGPT for most of it and understanding the code, but I'm doing next to nothing myself. No matter what code I input, the accuracy is always 0%... ...
User: Ananya Vijay
2 votes, 1 answer, 44 views

I can't get my R² above 70%

I tried RandomForest, LGBM, KNeighbors and polynomial regression as algorithms, along with cross-validation, a train/test split and StandardScaler, but nothing seems to get it past the 70% mark. The dataframe has ...
User: user178825
1 vote, 1 answer, 45 views

RFECV and grid search - what sets to use for hyperparameter tuning?

I am running machine learning models (all with scikit-learn estimators, no neural networks) using a custom dataset with a number of features and a binary output. I first split the dataset into 0.6 (...
User: Alex
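For the RFECV and grid search question, one common pattern, sketched under assumptions (synthetic data, logistic regression as both the RFECV ranking model and the final classifier, a 75/25 split chosen only for illustration): put RFECV and the model in one `Pipeline` and tune it with `GridSearchCV` on the training portion only, so feature elimination is repeated inside every CV fold and the test set is used once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data with a binary target.
X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipe = Pipeline([
    # Refit inside every CV fold, so the held-out fold never
    # influences which features are kept.
    ("select", RFECV(LogisticRegression(max_iter=1000), cv=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)  # selection and tuning see training data only

print(search.best_params_)
print(search.score(X_test, y_test))  # the test set is touched exactly once
```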
1 vote, 1 answer, 48 views

Manual Python Implementation of Stacking Model

I tried to build a Python class, CustomStackingClassifier(), to implement the Stacking method in ensemble machine learning. In this implementation, the output of the base classifiers is set to be the ...
User: CM_Li
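For the manual stacking question, a minimal sketch of stacking with out-of-fold base predictions, assuming the meta-classifier consumes the base models' predicted probabilities. The class name `OOFStackingClassifier` is hypothetical and is not the asker's `CustomStackingClassifier`. scikit-learn's own `sklearn.ensemble.StackingClassifier` follows the same idea, training its final estimator on cross-validated predictions of the base estimators.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


class OOFStackingClassifier:
    """Hypothetical minimal stacker: meta-model fit on out-of-fold probabilities."""

    def __init__(self, base_estimators, meta_estimator, cv=5):
        self.base_estimators = base_estimators
        self.meta_estimator = meta_estimator
        self.cv = cv

    def fit(self, X, y):
        # Each row is predicted by a base model that never saw it, so the
        # meta-model does not learn from overfit in-sample outputs.
        meta_features = np.hstack([
            cross_val_predict(clone(est), X, y, cv=self.cv, method="predict_proba")
            for est in self.base_estimators
        ])
        self.meta_ = clone(self.meta_estimator).fit(meta_features, y)
        # Refit every base model on all training data for use at predict time.
        self.fitted_bases_ = [clone(est).fit(X, y) for est in self.base_estimators]
        return self

    def predict(self, X):
        meta_features = np.hstack([est.predict_proba(X) for est in self.fitted_bases_])
        return self.meta_.predict(meta_features)


X, y = make_classification(n_samples=400, random_state=0)
stack = OOFStackingClassifier(
    [DecisionTreeClassifier(max_depth=3, random_state=0), KNeighborsClassifier()],
    LogisticRegression(),
).fit(X[:300], y[:300])
print((stack.predict(X[300:]) == y[300:]).mean())  # held-out accuracy
```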
3 votes, 1 answer, 81 views

Comparing clusterings from different datasets

I have 2 different data sets with essentially the same variables, though one is data from one year and the other is data from another year. I've run KModes on both data sets and now have some ...
User: ethqnol
2 votes, 2 answers, 142 views

Random Forest always predicting the majority class

I'm predicting disease outcome using biological data (metabolites plus covariates age, sex and BMI). The outcome is a binary variable and moderately imbalanced (~12% positive cases). I have a ...
User: be_nice
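For the random forest that always predicts the majority class, two standard levers with roughly 12% positives are class weighting and moving the decision threshold away from the implicit 0.5. The sketch uses synthetic data in place of the metabolite and covariate data, and the 0.2 cutoff is an arbitrary assumption; in practice the threshold would be chosen on validation data or by cross-validation, and a metric such as balanced accuracy reported instead of plain accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 12% positives, standing in for the real cohort.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.88], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0  # keep the class ratio in both splits
)

rf = RandomForestClassifier(
    n_estimators=300,
    class_weight="balanced_subsample",  # reweight classes within each bootstrap sample
    min_samples_leaf=5,                 # larger leaves give smoother probabilities
    random_state=0,
).fit(X_train, y_train)

proba = rf.predict_proba(X_test)[:, 1]
y_default = rf.predict(X_test)          # effectively a 0.5 cutoff
y_lowered = (proba >= 0.2).astype(int)  # illustrative cutoff; choose it on validation data

print(balanced_accuracy_score(y_test, y_default))
print(balanced_accuracy_score(y_test, y_lowered))
```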
0 votes, 0 answers, 34 views

Is it possible to compute the Davies-Bouldin score from a precomputed distance matrix using sklearn?

I'm trying to compute the Davies-Bouldin score to compare different clustering approaches. I have a precomputed distance matrix (that represents edit-based distance between texts). I'm using the scikit-...
User: Tim
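For the Davies-Bouldin question: as far as I know, `sklearn.metrics.davies_bouldin_score(X, labels)` expects a feature matrix, because it computes cluster centroids, and has no precomputed-distance option (`silhouette_score`, by contrast, accepts `metric="precomputed"`). The sketch below is a hypothetical workaround that swaps centroids for medoids; it is a variant of the index, so its values are not directly comparable with scikit-learn's `davies_bouldin_score`.

```python
import numpy as np


def davies_bouldin_medoid(D, labels):
    """Hypothetical medoid-based Davies-Bouldin variant for a precomputed distance matrix.

    Assumption: cluster "centres" are medoids (the member with the smallest summed
    distance to the other members), since centroids cannot be formed from distances alone.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    medoids, scatter = [], []
    for c in clusters:
        idx = np.flatnonzero(labels == c)
        sub = D[np.ix_(idx, idx)]
        m = idx[np.argmin(sub.sum(axis=1))]  # medoid index in the full matrix
        medoids.append(m)
        scatter.append(D[idx, m].mean())     # mean member-to-medoid distance
    scatter = np.array(scatter)
    centre_dist = D[np.ix_(medoids, medoids)]  # distances between medoids

    k = len(clusters)
    worst = np.empty(k)
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / centre_dist[i, j] for j in range(k) if j != i]
        worst[i] = max(ratios)  # worst-case similarity of cluster i to any other
    return worst.mean()         # lower is better, as with the usual index


# Two tight, well-separated 1-D clusters as a sanity check.
points = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(points[:, None] - points[None, :])
print(davies_bouldin_medoid(D, [0, 0, 1, 1]))  # 0.1
```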
0 votes, 1 answer, 79 views

As an intermediate R programmer looking to dive into machine learning, should I choose Python or stick with R?

Background: I am an intermediate R programmer with some experience in machine learning concepts and simple modeling in R. I have an opportunity to collaborate with a professional machine learning team ...
User: a.sa.5969
0 votes, 0 answers, 271 views

Correct method to report Randomized Search CV results

I have searched online but I still cannot find a definitive answer on how to "correctly" report the results from hyperparameter tuning a machine learning model; though, this may just be some ...
User: user167433
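For the question on reporting RandomizedSearchCV results, one common convention, offered as an assumption rather than a standard: report the search space and `n_iter`, the CV scheme, the best parameters, the mean and standard deviation of the CV score for the top candidates (all available in `cv_results_`), and a single score of the refit best model on a test set held out from the search. A sketch with synthetic data and a random forest:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    n_iter=4,  # 4 of the 6 possible combinations
    cv=5,
    random_state=0,
).fit(X_train, y_train)

res = search.cv_results_
# One line per sampled candidate, best first: mean and spread of the CV score.
for i in np.argsort(res["rank_test_score"]):
    print(res["rank_test_score"][i], res["params"][i],
          f'{res["mean_test_score"][i]:.3f} +/- {res["std_test_score"][i]:.3f}')

# Held-out estimate for the refit best model, computed once.
print("test accuracy:", search.score(X_test, y_test))
```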
-1 votes, 1 answer, 58 views

Label encoding & one-hot encoding

I have read somewhere that label encoding is only used for the target variable, and that for the input features we can use one-hot encoding (nominal features) and ordinal encoding (features that have an order). I am ...
User: Sofia Malik
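For the encoding question, the distinction it describes matches scikit-learn's API: `LabelEncoder` is documented for encoding the target `y`, while `OneHotEncoder` and `OrdinalEncoder` handle input features. A minimal sketch on made-up columns (colour as nominal, size as ordinal, all hypothetical):

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Hypothetical toy columns.
colour = [["red"], ["green"], ["blue"], ["green"]]  # nominal: no natural order
size = [["low"], ["high"], ["medium"], ["low"]]     # ordinal: low < medium < high
target = ["cat", "dog", "cat", "bird"]

# Nominal feature -> one binary column per category (sorted: blue, green, red).
X_colour = OneHotEncoder().fit_transform(colour).toarray()

# Ordinal feature -> integers in an explicitly supplied order.
X_size = OrdinalEncoder(categories=[["low", "medium", "high"]]).fit_transform(size)

# Target -> integer class ids (sorted: bird=0, cat=1, dog=2); meant for y, not X.
y = LabelEncoder().fit_transform(target)

print(X_colour)        # rows for red, green, blue, green
print(X_size.ravel())  # [0. 2. 1. 0.]
print(y)               # [1 2 1 0]
```

Passing the category order explicitly matters: without it, `OrdinalEncoder` sorts categories alphabetically (high, low, medium), which would scramble the intended ranking.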
0 votes, 0 answers, 11 views

Implementation of multi-classification meta-estimators in scikit-learn

In scikit-learn we have different methods to deal with multiclass classification problems; below are some of the meta-estimators used: a. OneVsRestClassifier and ...
User: SOHAM SACHIN KULKARNI
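For the multiclass meta-estimator question, a small sketch contrasting the two meta-estimators on a synthetic four-class problem, with logistic regression as an arbitrary base classifier: `OneVsRestClassifier` fits one binary model per class (K models), while `OneVsOneClassifier` fits one per pair of classes (K(K-1)/2 models).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic four-class problem.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)

# One "class k vs. all other classes" model per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One model per pair of classes, each trained only on those two classes' rows.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 4 = K
print(len(ovo.estimators_))  # 6 = K * (K - 1) / 2
```

The trade-off is that one-vs-one trains more models, but each sees only a subset of the data, which can help base estimators that scale poorly with sample count.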
4 votes, 2 answers, 177 views

Loss function in Isolation Forest

I recently came across this algorithm while working on my graduation project. As per my understanding, we create a tree for each subsample. Then we calculate the scores for each ...
User: Mayank Singh
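For the Isolation Forest loss-function question: Isolation Forest does not minimise a loss while building trees. Split features and split values are chosen at random, and the score comes from how quickly a point is isolated. The sketch implements the anomaly score from the original paper (Liu, Ting and Zhou), s(x, n) = 2^(-E[h(x)]/c(n)), using the usual harmonic-number approximation. Note that scikit-learn's `score_samples` returns the opposite (negated) value of this score.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); near 1 = anomalous, well below 0.5 = normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # subsample size per tree
print(anomaly_score(average_path_length(n), n))  # 0.5: path length equals the average
print(anomaly_score(2.0, n))   # short path -> isolated quickly -> higher score
print(anomaly_score(15.0, n))  # long path -> lower score
```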
3 votes, 2 answers, 576 views

How can I fit sklearn.svm.SVC with three features, given that the features are actually arrays of lengths 128, 12 and 40?

To clarify, each instance of feature_1 is a 128-item array, each instance of feature_2 is a 12-item array, and each instance of feature_3 is a 40-item array. I am currently simply doing ...
User: Karn Varshneya
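For the SVC question with array-valued features, one common approach, assumed here since the excerpt is cut off: treat each array element as its own column by concatenating the three arrays into a single 180-column matrix, and standardise so the blocks are on comparable scales. The data below are random placeholders.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 60
# Random stand-ins shaped like the three array-valued features.
feature_1 = rng.normal(size=(n, 128))
feature_2 = rng.normal(size=(n, 12))
feature_3 = rng.normal(size=(n, 40))
y = rng.integers(0, 2, size=n)

# Flatten the three arrays into one 128 + 12 + 40 = 180-column row per instance.
X = np.hstack([feature_1, feature_2, feature_3])

# Scaling keeps one block from dominating the RBF distance purely through its units.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(X.shape)  # (60, 180)
```

Even after scaling, the 128-wide block contributes more dimensions, and therefore more weight in the RBF distance, than the 12-wide one; reweighting the blocks would be a separate design decision.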
